Turkish word segmentation using morphological analyzer

Authors

  • M. Oguzhan Külekci
  • Mehmed Özkan
Abstract

This paper describes an algorithm that uses a morphological analyzer to segment an input Turkish string containing no spaces, such as the output of a speech-to-text application, into words. The algorithm can also be applied to other languages for which a morphological analysis component is available. A Turkish morphological analyzer is designed and implemented as the linguistic engine of the algorithm. The analyzer is built around a technique that attempts group-wise morpheme recognition instead of searching for suffixes one by one in a word.
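
The abstract does not give the algorithm's details, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a toy analyzer (`is_word`) backed by a hypothetical stem list and a hypothetical table of suffix groups stored as whole units, which loosely mirrors recognizing suffixes as a group rather than one at a time. The toy data ignores vowel harmony, consonant changes, and part of speech; all names here (`STEMS`, `SUFFIX_GROUPS`, `is_word`, `segment`) are illustrative assumptions.

```python
from functools import lru_cache

# Toy stand-in for a morphological analyzer (hypothetical data, not from the paper).
STEMS = {"ev", "gel", "göl"}
# Whole suffix sequences stored as single units, so a word is accepted with one
# lookup per candidate stem instead of peeling suffixes off one at a time.
SUFFIX_GROUPS = {"", "de", "den", "e", "ler", "lerde", "di", "dim"}


def is_word(token: str) -> bool:
    """Accept token if it splits into a known stem plus a known suffix group."""
    return any(
        token[:i] in STEMS and token[i:] in SUFFIX_GROUPS
        for i in range(1, len(token) + 1)
    )


def segment(text: str) -> list[list[str]]:
    """Return every way to split text into tokens accepted by is_word."""

    @lru_cache(maxsize=None)
    def from_pos(start: int) -> tuple[tuple[str, ...], ...]:
        if start == len(text):
            return ((),)  # exactly one way to segment the empty remainder
        results = []
        for end in range(start + 1, len(text) + 1):
            token = text[start:end]
            if is_word(token):
                # Extend every segmentation of the remainder with this token.
                for rest in from_pos(end):
                    results.append((token,) + rest)
        return tuple(results)

    return [list(s) for s in from_pos(0)]


print(segment("evdengeldim"))  # [['evden', 'geldim']]
```

With this toy data, `evdengeldim` has a single segmentation because no accepted word can start at the positions following `ev` or `evde`. A real analyzer would accept far more word forms, so an ambiguous input could return several candidate segmentations that need to be ranked or disambiguated separately.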

Similar resources

A set of open source tools for Turkish natural language processing

This paper introduces a set of freely available, open-source tools for Turkish that are built around TRmorph, a morphological analyzer introduced earlier in Çöltekin (2010a). The article first provides an update on the analyzer, which includes a complete rewrite using a different finite-state description language and tool set as well as major tagset changes to comply better with the state-of-th...

Building a Turkish ASR system with minimal resources

We present an open-vocabulary Turkish news transcription system built with almost no language-specific resources. Our acoustic models are bootstrapped from those of a well trained source language (Italian), without using any Turkish transcribed data. For language modeling, we apply unsupervised word segmentation induced with a state-of-the-art technique (Creutz and Lagus, 2005) and we introduce...

A Paradigm-Based Finite State Morphological Analyzer for Marathi

A morphological analyzer forms the foundation for many NLP applications of Indian Languages. In this paper, we propose and evaluate the morphological analyzer for Marathi, an inflectional language. The morphological analyzer exploits the efficiency and flexibility offered by finite state machines in modeling the morphotactics while using the well devised system of paradigms to handle the stem a...

A Trie-Structured Bayesian Model for Unsupervised Morphological Segmentation

In this paper, we introduce a trie-structured Bayesian model for unsupervised morphological segmentation. We adopt prior information from different sources in the model. We use neural word embeddings to discover words that are morphologically derived from each other and thereby that are semantically similar. We use letter successor variety counts obtained from tries that are built by neural wor...

A pronunciation lexicon for Turkish based on two-level morphology

This paper describes the implementation of a full-scale pronunciation lexicon for Turkish based on a two-level morphological analyzer. The system produces at its output, a parallel representation of the pronunciation and the morphological analysis of the word form so that morphological disambiguation can be used to disambiguate pronunciation when necessary. The pronunciation representation is b...

Publication date: 2001